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ABSTRACT 


The Internet is an evolving, robust system with built in redundaney to ensure the flow of infor¬ 
mation regardless of any aet of nature or man-made event. This makes mapping the Internet a 
daunting task, but important beeause understanding its strueture helps identifying vulnerabilities 
and possibly optimizing traffie through the network. We explore CAIDA’s and NFS’s probing 
methodologies to verify the assentation that NFS’s probing methodology diseovers eomparable 
Internet topologies in less time. We eompare these by modeling union of traeeroute outputs 
as graphs, and using standard graph theoretieal measurements as well as a reeently introdueed 
measurement. Ultimately, the researehers verified the NFS’s probing methodology was eom- 
parable to the CAIDA’s probing methodology. We also propose additional avenues for further 
exploration from our initial diseoveries. We also introdueed a teehnique that ean possibility 
identify stable eore existenee among the whole Internet and explore ease studies of two eountry 
sub-graphs. 
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Executive Summary 


The Internet is an evolving, robust system with built in redundancy to ensure the flow of infor¬ 
mation regardless of any act of nature or man-made event. This makes mapping the Internet a 
daunting task, but important because understanding its structure helps identifying vulnerabilities 
and possibly optimizing traffic through the network. We explore CAIDA’s and NFS’ probing 
methodologies to verify the assentation that NFS’ probing methodology discovers comparable 
Internet topologies in less time. We compare these by modeling union of traceroute outputs as 
graphs, and study the graphs by using vertex and edge count, average vertex degree, clustering 
coefficient and the Fearson coefficient. The results from these measures show CAIDA’s and 
NFS’s probing methdologies are compromable. However, using a recently introduced measure¬ 
ment, the probing methodologies actual discover up to 40 percent different sets of vertices and 
edges captured during almost simulanteous probing. Ultimately, the researchers verified that the 
NFS’s probing methodology was comparable to the CAIDA’s probing methodology. We also 
proposed additional avenues for further exploration from our initial discoveries. We introduced 
a technique that can possibily identify stable core existence. We conducted preliminary analy¬ 
sis on the interesection of six inferred topologies with promising results. We believe additional 
probing samples might display the stable core of the Internet. Additionally, we identified South 
Korea and China as skewed for the NFS probing methodology and conducted a case study of 
each. We analyzed using the standard graph comparision measures and the intersection to iden¬ 
tify a possible stable core. We observed only five percent of stable vertices in China but 40 
percent in South Korea. 
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CHAPTER 1: 
Introduction 


1.1 Communications Evolution 

Throughout time societies have constantly explored ways to communicate, despite vast dis¬ 
tances, either for control, correspondence, commerce or spread of knowledge. We build vast 
networks to efficiently and rapidly broadcast the information with the newest technology at the 
time, (e.g., smoke signals, pony express, and telegraph systems). Recently, the computer age 
has increased the available amount of information which has consumed the inferior methods of 
distribution before it. This led the United States government to commission a more robust and 
fault tolerant communications system during the 1960s making way for the Advanced Research 
Projects Agency Network (ARPANet) which eventually became the Internet. 

Unfortunately, with the decommission of the ARPANet, the Internet structure became decentral¬ 
ized, thus harder to map. The physical structure of the Internet became proprietary technology 
to different business organizations making it difficult to understand how truly information is 
shared. This has made it increasingly difficult for people to effectively and successfully map 
the topology of the Internet. Thus a common interest in understanding exactly how links/edges 
are assigned among routers/nodes has emerged, with the main goal of developing algorithms to 
track the topology. The current thesis will address this with the goal of measuring and compar¬ 
ing outputs of two such algorithms. 

1.2 Why Measure the Internet? 

Developing an approximate Internet map is important because having an understanding about 
how routers connect and interact with each other can lead to better security, or increased ef¬ 
ficiency of traffic flow. The algorithms developed to discover the topology of the Internet are 
useful in understanding how an adversary can limit the exchange of information or completely 
disable the accessibility of a local network. This is just one type of attack that is commonly re¬ 
ferred to as a denial of service (DoS) or similarly a distributed denial of service (DDoS) which 
targets a system by typically overloading it either through bandwidth or memory. This is a 
very effective method of disruption which countries and companies spend billions of dollars 
to prevent. With an understanding of the Internet topology, we can help to mitigate some of 
the bandwidth bottlenecking that DoS attacks target. Additionally, it could help us understand 
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where to create better redundancy in the network that will prevent accidental DoS by an organi¬ 
zation similar to DigitalGlobe’s recent request for users help to scour through vast amounts of 
satellite imagery to find the missing Malaysian Air flight 370 [1]. 

1.3 Research Question 

We began our research wanting to know the following questions. 

• What are the substantiative differences between the NPS probing methodology and the 
CAIDA probing methodology? 

• Is there any bias in the algorithms? 

• Does one algorithm geographically, by country, discover more of a network? 

We were limited on our research by the number of probing cycles available for analysis. We 
investigated the available probing cycles and made some educated inferences to how NPS and 
CAIDA algorithms compare to each other. 

1.4 Thesis Contribution 

We use existing graphical analysis to compare large graphs at a very course granularity. We 
did this to check similarity of the multiple probing cycles using the existing analysis tools, 
(i.e., average degree, vertice count, edge count and clustering coefficient). We then used a 
recently introduced concept of VSD and BSD to compare vertex to vertex and edge to edge how 
similar two probing cycles are to discovering the same topology. Next we found the intersection 
between cycles to identify the amount of the Internet that we call the Stable Core found by each 
probing methodology. We then compared CAIDA and NPS probing methodology’s Stable Core 
to discover similarities. We then compare sub-graphs (data divided by country) to find which 
probing methodology more accurately represents the Internet. We compared the number of 
vertices within each country and check if it represents the reported Internet users in [2] and 
country allocated IP space. 

1.5 Organization of Thesis 

In investigating the research question, this thesis is organized as follows: 

• Chapter 1 discusses the motivation of the research. 

• Chapter 2 discusses prior and related work in the fields of measuring the Internet. 
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• Chapter 3 introduces the machinery used in the analysis conducted in the course of the 
research. 

• Chapter 4 details the data used and the methodology that was developed. 

• Chapter 5 contains the results of case studies conducted using our measures. 

• Chapter 6 contains the summary and discusses possible areas for future work. 
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CHAPTER 2: 
Background 


In this chapter, we establish a base knowledge of the Internet and some key terminology. This 
will allow a common language for understanding how the Internet is formed, controlled and 
how it evolved in time. 

2.1 Overview of the Internet 

The Internet is a global system of connected routers which follow an agreed-upon standard of 
protocol suites. These protocols are established through the Internet Engineering Task Forces 
(lETFs) and published in Request for Comments (RFCs)^ that serve billions of users worldwide. 
The protocols are known as IPs, which are similar to a mailing address. Information is first 
packeted with the destination addresses and are routed by routers within Autonomous Systems 
(ASes) where information is then transmitted. The ASes are a group of IP prefixes, under central 
control of one or more network operators that presents a common, clearly defined routing policy 
to the Internet [4]. The routes are nothing more then a path connecting vertices (users) through 
ASes to each other. 

We view the Internet as a group of many symbiotic communities that operate as collective 
ASes. These ASes work together by connecting to each other forming larger networks. The 
AS is typically an Internet service provider (ISP) or other large organization with connections 
to multiple other ASes (e.g., Comcast, Verizon and universities). Each AS has an officially 
registered Autonomous System Number (ASN) and adheres to the RFCs to properly route the 
information. When multiple routers establish connections/edges with other routers, they build a 
routing table and share their table among all the connected routers. The routers distribute these 
tables in order to compute the most efficient paths that adhere to the businesses constraints of 
the providers. This occurs because the routers identify preferred routes between themselves to 
other routers/vertices. However, there exists a delineation between internal and external routers 
within an AS. Internal routers will only handle the traffic within their ASes and will refer to all 
ingress facing or transit routers for any connection outside their organic ASes. The internal AS 
routers can save IP space by allowing large companies to subdivide the company’s network to 
enable more users without affecting the ingress router, an expression called "piggybacking" or 

'Document series containing technical and organizational notes about the Internet [3]. 
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"subnetting." Likewise, all ingress/extemal routers will ignore any subnetting groups of large 
networks, namely their own AS and outside AS. The design of internal and ingress/external 
routers help to reduee the number of entries in the ingress/external transit routing tables. The 
router aeeomplishes this by delegating some roles to the internal routers within the ASes and 
other responsibilities to the ingress/external routers. 

In Figure 2.1 [5], the boundary routers (e.g., Rll, R12, R21 and R31 in Figure 2.1) are ingress 
routers that bridge the ASes. These routers are eommonly referred to as ingress, while internal 
routers sueh as R13, and R14 in Figure 2.1 handle the traffie inside their AS. Physieally, the 
routers are eonneeted, via their interfaees, and wired or wireless eonneetions between interfaees 
on other routers/deviees. A trivialized representation of the Internet is shown in Figure 2.1 
where the ASes are identified by their respeetive ASNs. In this illustration, the transit ^ edges 
eonneet ASN 1 to ASN 2 and ASN 1 to ASN3. 



^Edge between two external routers. 
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2.2 Internet Topology 

The true Internet topology is a difficult problem to truly represent due to the ever changing 
structure of the Internet, sheer size and evolution. When we map the Internet we are looking 
at a "snap shot" in time of what the Internet looked like. The research community hopes that 
taking multiple "snap shots" throughout time will allow the researcher to map and gain some 
insight to the Internet’s structure. For our research, what is more arduous is what these "snap 
shots" mean and how we compare them, before being able to predict how the Internet will act. 

2.2.1 Levels of Internet Topology 

There are numerous granularity levels we can study within the Internet, (e.g., fiber, IP address, 
router, and ASes). For each level, we represent a fictitious network with a corresponding graph¬ 
ical representation(s). We are particularly interested in interface-level mapping because it offers 
the most clarity of connectivity and if we have the IP level graph, then the others could be in¬ 
ferred. We limit our research to this interface-level and the connectivity of the segments of the 
Internet. Below we discuss the different levels for completeness. 

AS-level topology. At the AS level, the ASes are vertices and communication paths between 
them form edges between the vertices. This is seen more prominently as provider-customer and 
peer relations [6]. With this system, a customer pays a provider to connect to the Internet outside 
of its administrative domain. Then these providers contractually agree to exchange information 
traffic between each other’s customers mostly free of charge. Thus, the Intranets and their 
connections that the ISPs provide to its customers form the Internet between the administrative 
domains. This means that the AS level captures more of an economic relationship between ISPs 
rather than the physical connections between routers of the Internet. 

The AS-level graph representation of Figure 2.2a is shown in Figure 2.2b from [5]. 

Subnet-level topology. Subnet-level mapping [7] involves discovering the IP addresses that are 
hosted on the same subnet. These subnets are defined by the interfaces that they connect to. The 
subnet level topology has the subnets as the vertices and the routers as the edges connecting the 
various subnets. This methodology is shown in Figure 2.3 from [5]. 

Interface-level topology. Interface-level or IP level routing works similarly as subnet-level 
mapping because router interfaces and end hosts are captured. The connections between two 
router interfaces form an edge, while the interfaces themselves are the vertices. It is important to 
note how routers can have multiple interfaces, which could require multiple graphs to ensure the 
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(a) Network map. 



Figure 2.2: Autonomous System-level representations from [5]. 



(a) Network map. 



(b) Graph. 


Figure 2.3: Subnet-level representations from [5]. 


diseovery of all eonneetions/edges between the routers. Figure 2.4 [5] displays some variations 
of graphs depieting various perspeetives to different end points. 

Router-level topology. Routers have multiple interfaees, eaeh with a different IP address, 
and by IP Alias Resolution ^ they are eombined to represent one vertex in the Router-level 
graph. These vertiees are then eonneeted by edges, whieh represent a possible physieal link 
between the routers. These vertiees and edges form the graph used to represent the Internet. 

process to identify IPs which belong to the same router. 
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(a) Network map. 



(b) Graph of interfaces as seen (c) Graph of interfaces as seen (d) Graph of interfaces as seen 
from X. from Y. from Z. 




(e) Graph of interfaces as seen (f) Graph of interfaces as seen from 
fromR22. R31. 

Figure 2.4: Interface-level representations from [5]. 


The topology of these graphs formed from this method are more useful because the vertices 
represent the physical routers and the edges the physical links/edges of the actual physical 
network layout. However, IP Alias Resolution is not precise, although current research shows 
promise in solving this problem [8-10]. Thus, having an understanding at the IP level will help 


9 











generate an understanding of the Internet at the router level. 




(a) Network map. (b) Graph. 

Figure 2.5: Router-level representations from [5]. 


2.2.2 Acquiring Active Network Topology 

There are numerous Network Topology Capture (NTC) algorithms to aequire network topolo¬ 
gies ineluding, DIMES, IPlane, Ark IPv4 All Prefix /24 and reeently NPS probing methodol¬ 
ogy. NPS probing methodology is different from the others beeause it ineludes adaptive probing 
teehniques whieh leverage ingress knowledge and data from previous oyoles(s) to ehoose the 
best probe destination and assignment of vantage point to destination [11]. The idea is by ex¬ 
ploiting previous knowledge of the data, one ean reduee the eost, mainly time and additional 
traffie load on the Internet. NTC’s goal is to reduee the diseovery eosts (i.e., probes sent and 
time diseovery time), while ensuring maximum eoverage of the network [12]. However, there 
are limitations of NTCs from offline systems, firewalls or overloaded edges during the time of 
probing. In [13], the authors investigated if the time of day for a probing eyele matters. They 
did not find evidenee that the time of the day matters, but they eaution that their sampling size 
was small and additional researeh should be done. 

Ideally, we want to use ground-truth ^ to eompare the effeetiveness of NTC algorithms. Unfor¬ 
tunately, it is impossible to obtain ground-truth beeause many organizations will not share the 
information for seeurity reasons. An organization’s network graph ean expose seeurity flaws 
and allow the possibility of sabotage. Organizations like CAIDA provide datasets [14] from 
both aetive and passive measurement of the Internet that are readily available to the researeh 
eommunity. We will use some of CAIDA’s datasets throughout our researeh, whieh of eourse 
is not ground-truth, but it provides us data to eompare NPS’s probing methodology. 

^Actual existing graph of a real network. 
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2.2.3 Traceroute 

Traceroute is a network diagnostic tool that allows a user to identify how a network sends 
information between devices/vertices. The use of traceroute allows a user to assess a network to 
help identify and fix connection issues. Typically most traceroutes use Internet control message 
protocol (ICMP) and a Time to live (TTL) that increment at every point of routing. This is 
useful because after the completion of each trace, a history of the forward interface-level path 
and time to send and acknowledge are available to analyze. However, traceroute may not return 
all router path information because some routers are formated to reply anomalously. 

Paris traceroute improved traceroute by accommodating load balancing routers [15,16]. The 
improvement provides more accurate information to paths along a network because the load 
balancing routers have the ability to direct information or probes along different paths and hide 
the reality of the path. An example of Paris traceroute is provided in Figure 2.6 from [5]. 


Real Topology 


Source 



Destination 


Classic traceroute 
(possible outcome) 

Source 



Paris traceroute 

Source 



Destination 


Figure 2.6: Classic versus Paris traceroute adapted from [5,15]. 


Figure 2.6 illustrates how the Paris traceroute is preferred over the classic traceroute. The 
left diagram in Figure 2.6 provides the ground-truth of the network where router L provides 
load balancing across two paths. The middle diagram shows the representation of traceroute 
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result, while the one of the right is the Paris traeeroute. The Paris traeeroute eaptures a better 
representation of the real topology of the network. 

2.3 Existing Internet Topology Views 

There are many different views to the exaet strueture of the Internet and how to graphieally 
represent the Internet. In Figure 2.7, the Internet has a eore of fiber-optie eables eonneeting 
ISPs to the end users, (i.e., home and business users). 



Figure 2.7: Visualization of USA Internet from [17]. 


Another view many researehers share is that the Internet is a heuristieally optimal topology 
where the eore is sparse with low degree routers whieh eonneet to high degree edge providers 
then to hosts/end users. An example of this strueture is found in Figure 2.8a. 

Both of the views on the Internet strueture are eurrently aeeepted by researehers. We also 
wanted to show some previous graphieal representation of the Internet from aetual traeeroutes. 
Cheswiek has done eonsiderable researeh in developing an algorithm to display traeeroute in¬ 
formation obtained primarily from CAIDA. An example of some of his visualizations of the 
Internet are found in Figure 2.9. Cheswiek has named the Internet images as a "Peaeoek on a 
Windshield." Another Internet visualization example is found in Figure 2.10. This visual only 
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Intermountain GigaPoP 


Heuristically Optimal Topology 



^ ^ HTgh degree nodes are™the edges, f 


Arizona St. I / Front Range GigaPoP 


Northern Lights 
Indiana GigaPoP 

Starlight 
WiscREN 
NYSERNet 



North Texas GigaPoP ' 
Texas GigaPoP ' UT Austin 

Abilene Backbone 
Physical Connectivity 
(as of August 2004) 

0.1-0.5 Gbps 

- 0.5-1.0 Gbps 

- 1.0-5.0 Gbps 

- 5.0-10.0 Gbps 


DARPA BossNet 
Jackson St. 

U. Florida 
U. So. Florida 


(a) Network map. 

Figure 2.8: Router-level 


(b) Router-level topology of Abilene. 

representations from [18]. 


displays the baekbone of the Internet, but shows that even smaller subsets of the Internet are 
overwhelming to visually analyze. 



Figure 2.9: Cheswick map of Internet from [19]. 


We will model snapshots of the Internet by graphs to faeilitate its measurement. We will use 
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Figure 2.10: CAIDA IP map of Internet from [20]. 


python eoding (partieularly NetworkX) to analyze existing data eomparison algorithms. Addi¬ 
tionally, we will refer to CAIDA’s measuring tools to eompare large networks and we append 
additional tools. As previously mentioned, we will model the Internet at the interfaee-level 
using multiple "snap shots," with the a vertex representing an interfaee or IP and an edge repre¬ 
senting a eonneetion between two interfaees obtained from a traeeroute. 
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CHAPTER 3: 
Mathematical Background 


In this chapter, we introduce some mathematical theory that will create a common language 
as we refer to the Internet as a graph. We focus on set theory, graph theory and established 
complex graph comparison tools. These tools are helpful to compare large graphs because of 
ease of calculation and mathematically proven characteristics. Additionally, we will introduce 
a technique we believe will lead into understanding the stable core of the Internet. 

3.1 Set Theory Terminology and Terms 

A basic understanding in set theory will allow the reader to follow the results obtained from the 
measures of similarities and differences between graphs. We also use graph theory to understand 
the characteristics that large data sets exhibit, allowing further insight into the information’s 
meaning. We accomplish this by taking the large data sets and turning them into graphs. We 
then run existing measures in NetworkX^ to aid in understanding the data behind the graphs. 

The following definitions for Set Theory are found in [21]. The terminology and theory are a 
base knowledge and reference to understand our methodology and results. 

A set G is an unordered collection of objects, called elements or members of the set. A set is 
said to contain its elements. We write a G A to denote that a is an element of the set A. The 
notation a denotes that a is not an element of the set A. 

The union, A U 5, of the sets A and B, denoted by A U 5, is the set that contains those elements 
that are either in A or in B, or in both, see Figure 3.1. 

The intersection, A fi B, of the sets A and B, denoted by A fi B, is the set containing those 
elements in both A and B, see Figure 3.1. 

The generalized unions of a collection of sets is the set that contains those elements that are 
members of at least one set in the collection (e.g.. Figure 3.2). 

While the generalized intersections of a collection of sets is the set that contains those elements 
that are members of all the sets in the collection also as seen in Figure 3.2. 

^Python based software package for creation, manipulation, and study of the structure, dynamics and functions 
of complex networks. 
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Figure 3.1: Union and intersection from [21]. 



Figure 3.2: Generali z ed Union and intersection from [21]. 

The Symmetric Difference of A and B, A © 5, is the set containing those elements in exactly 
one of A and B, see Figure 3.3. 



Figure 3.3: Symmetric Difference A ©B. 


3.2 Graph Theory Terminology and Concepts 

The below definitions and concepts are found in [22]. Any additional terminology we will 
individually reference. 
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A graph G consists of a finite nonempty set V of objeets ealled vertices and a set E of 2-element 


subsets of V ealled edges, see Figure 3.4. The sets V and E are the vertex set and the edge set 
of G, respectively. Vertices are also called points or nodes and edges are sometimes called lines 
or arcs. For Figure 3.4, the vertex set is 


y(G) = {C1,C2,C3,C4,C5,C6,C7} 


and the edge set is 




G: 


Figure 3.4: Example graph. 


Node or vertex Count The vertex count of a graph G, commonly denoted |y (G) | or |G|, is the 
number of vertices in G. In other words, it is the cardinality of the vertex set. 

The edge Count of a graph G, commonly denoted M(G) or E{G) and sometimes also called 
the edge number, is the number of edges in G. In other words, it is the cardinality of the edge 
set. 

A path is used to describe both a manner of traversing certain vertices and edges of G and a 
subgraph consisting of the sequence of those vertices and edges. A path is a m — v walk in a 
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graph in which no vertices are repeated. An example of a path in Figure 3.4, from ci to C 4 is 


P= (C 1 ,C 3 ,C 4 ) 

and it is not unique since there also exist another path. 


P= (C 1 ,C 2 ,C 3 ,C 4 ) 


There are actually multiple paths from c\ to C4. 


A graph G is connected if every two vertices of G are connected, that is, if G contains a path 
M — V for every pair m, v of vertices of G. A graph G that is not connected is called disconnected. 


A trail is terminology borrowed from the Old West and defined asau — v trail in a graph G to be 
au — v walk in which no edge is traversed more than once. An example of a trail in Figure 3.4, 
from C 5 to C 2 is 

T = (C5,C4,C6,C7,C4,C2) 

and can repeat vertices as we did with C4. A circuit in a graph G is a closed trail of length 3 or 
more. Our previous example of a trail can be a circuit, C 5 ,..., C 5 , but another example is a circuit 
from Cl to C 6 is 

Cl = (ci,C7,C4,C2,C7,C6,Ci) 


but also 


C2 = (ci,C2,C3,C4,C7,C6,Ci) 


this shows a circuit is not unique between two vertices. A cycle is a circuit that repeats no 
vertex, except for the first and last. A k —cycle is a cycle of length k, (i.e., k vertices), for 
example a 3 — cycle is commonly referred to as a triangle. From Figure 3.4 an example of a 
3 — cycle is 

C3 = (ci,C5,C3,Ci) 


while an example of a 5 — cycle is 


C4 = (ci,C2,C3,C4,C7,Ci) 


The degree of a vertex v in a graph G is the number of edges incident with v. For example in 
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Figure 3.4 the degree of cq. 

deg C 7 = 4 

For two vertices u and v in a graph G, the distance d{u,v) from m to v is the length of a shortest 
u — v path in G. A M — V path of length d{u,v) is called au — v geodesic. The path ci, C 5 , C 4 is a 
geodesic from ci to C4. 

For a vertex v in a connected graph G, the eccentricity e(v) of v is the distance between v 
and a vertex farthest from v in G. The minimum eccentricity among the vertices of G is its 
radius and the maximum eccentricity is its diameter, with are denoted by rad{G) and diam{G), 
respectively. For example, e{c\) = 2 , rad{G) = 2 and diam{G) = 2. 

Symmetric difference [23] Conventionally, if G and H are graphs with vertex set V, then the 
symmetric difference GAH is the graph with vertex set V whose edges are all those edges 
appearing in exactly one of G and H. Note that the GAH the set operation of symmetric 
difference is done on the edges of the graph, and it is not what we use in this research. 

In this paper, we model Internet connections as an undirected graph. We do this because the 
nature of bi-directional information exchanged on the Internet between endpoints. Therefore, 
we will not use directed graphs, pseudo graphs or multi-graphs to represent the Internet from 
our data. 

3.3 Complex Network Measures 

In our analysis we use some measures that CAIDA has used in previous research papers [24,25] 
and have provided the definitions below. We later augment with other measures. 

Average vertex degree (k), is the ratio of edges to vertices, where n is the number of vertices 
in the graph and m is the number of edges in the graph: 

2 m 
k = —. 
n 

This is the average degree of all the degrees in the graph, and is derived from the first theorem of 
graph theory.^ This is considered one of the coarsest measurements for graph comparison, but 
serves as an easy reference when comparing two large graphs. Additionally, a topology whose 
graph has a larger average vertex degree is likely to be more efficient and robust than those of 

^The sum of degrees is twice the number of edges in the graph. 
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lower average vertex degree. We use v for vertices and k for the degree of the vertex. 
Average clustering coefficient for a graph G is the average, 



where n is the number of vertices in G [26]. 

Additionally, we will examine a few other measures: the transitivity [27], degree Pearson 
correlation coefficient [28], VSD and ESD [5]. 

Similar to clustering, transitivity [27] computes the fraction of all possible triangles present in 
G. This is accomplished by identifying the total number of triangles out of all possible triads. 

^ triangles 
# triads 

A triad is a path (see Section 3.2) on three vertices that has the possibility of being a triangle or 
a path of length three. 

We use the degree Pearson correlation coefficient to compute the degree assortativity of a 
graph, the pearson correlation coefficient is defined as 

^ ^ i 

M-' Ei ^uf+kf) - [M-‘UJi + ki)? 

where ji, ki are the degrees of the vertices at the ends of the zth edge, with i= 1, ...,M [29]. This 
is used to measure the similarity of connections in the graph with respect to the vertex degree 
of the hubs. An example of each is displayed in Figure 3.5. The assortative graphs, 3.5a, show 
a preference of hubs to link to each other. While a disassortative graph, 3.5c, the hubs seem to 
avoid each other. 

A recently introduced comparison [5], is the VSD and ESD, which measure the percentage of 
change between two graphs either in terms of the vertices or the edges. The below definitions 
and examples for VSD and ESD were taken from [5]. 

Definition 3.3.1. For two graphs G and H, the vertex symmetric difference vsd{G,H) is defined 
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\ {Pk. 

A \ 



(c) Disassortative 

Figure 3.5: Example graphs for degree Pearson correlation coefficient from [30]. 


to be: 


vsd{G,H) = 


|V'(G)\V(H)| + |y(H)\V(G)| 


|V(G)| + |y(H)| 

As example, consider the two graphs in Figure 3.6, for which we have: 




Figure 3.6: Example to illustrate vsd and esd between two graphs from [5]. 


vsd{G,H) = 


\V{G)\V{H)\ + \V{H)\V{G)\ 


1 + 1 2 

^ = — = 16.7% 
6 + 6 12 


|V'(G)| + |V(H)| 

Definition 3.3.2. For two graphs G and H, the edge symmetric difference esd{G,H) is defined 

\E(G)\E{H)\ + \E{H)\E(G)\ 


as 


esd{G,H) = 


|E(G)| + |£(H)| 


Informally, we first count the edges present in one graph and not the other and then reversed. 
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This is then normalized over the total number of edges in both the graphs, so that it is relative 
to the size of the graphs. 

In the case where graphs G and H have exactly the same edges, esd{G, H) = 0. If graphs G and 
H are disjoint, or have totally different edges, esd{G,H) = 1. 

Thus, we see that the esd of any two graphs, G and H, lies in the closed interval [0,1] (i.e., 
0<esd{G,H) < 1). 

On this scale, we are able to say, intuitively, if the difference between the two graphs (edge¬ 
wise) is significant. 

For example, consider the two graphs in Figure 3.6 [5] 

We have 

=-|£(G)| + |£(H)|-= ^ = l5 = 

In this thesis, we introduce the idea of a possible stable core. We hypothesis that the stable core 
can be found by taking the generalized intersections of multiple "snapshots" of the Internet in 
time. We notice the stable core measurement of the network is the part that has limited change 
per many cycles, in our case six cycles. This part might represent more of the backbone of the 
ever changing Internet. 
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CHAPTER 4: 

Data and Methodology 


In this chapter, we outline our data collection procedures and explain some known limitations 
of our data usage. The application to the mathematical concepts will follow in Chapter 5. 


4.1 Source of Data 

We used data from two topology collection methodologies, NPS and CAIDA, to accrue our two 
sets of topology probing cycles and ran the NPS algorithm two additional probing cycles to 
identify trends we observed in the previous probing cycles. We ran each collection technique 
simultaneously^ to negate any interference with time or network load and also used the same 
set of vantage points. The first set of probing cycles was run on 4 September 2013 followed by 
two additional pair of probing cycles on 13 December 2013. After initial results from the two 
sets of NPS and CAIDA probing cycles, we ran two additional NPS probing probing cycles to 
identify if we continued to observe certain trends that we will expand upon in Chapter 5. The 
dates for the probing cycles are shown in Table 4.1. 


Probing Cycle 

Date 

NPS 1 

8/31/13 

CAIDA 1 

9/4/13 

CAIDA 2 

12/13/13 

NPS 2 

12/15/13 

NPS 3 

4/4/14 

NPS 4 

4/9/14 


Table 4.1: The dates of the corresponding probing cycles. 


4.1.1 CAIDA data 

CAIDA uses an active probing methodology called Archipelago (Ark), which we will also call 
"CAIDA probing methodology" throughout the paper. CAIDA developed Ark to: 

• reduce the effort needed to develop and deploy sophisticated large-scale measurements 

^Runs were conducted back to back switching which probing method was conducted first. 
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• provide a step toward a community-oriented measurement infrastructure by allowing col¬ 
laborators to run their vetted measurement tasks on a security-hardened distributed plat¬ 
form [31]. 

Ark uses coordinated large-scale traceroute-based topology measurements from a process called 
team probing to gather measurement to all routed /24’s* [31]. The team probing dynamically 
divides the IP space among three teams to provide a parallelization of all /24’s destinations, 
and collects data for in about two to three days per team of 17-18 monitors. The teams operate 
independently to prove the entire IP address space. Currently, CAIDA has 86 active monitors, 
seen in Figure 4.1, to provide data collected in parallel, with at least one in every continent 
except Antarctica [32]. The Ark measurement uses scamper, a powerful and flexible active 
measurement tool which supports Internet protocol version 4 (IPv4), Internet protocol version 
6 (IPv6), traceroute and ping [31]. CAIDA uses scamper because it supports transmission 
control protocol (TCP), user datagram protocol (UDP), and ICMP based measurements and 
Paris Traceroute variations [31]. 

Ark collects the data by sending probes continuously from random monitor vantage points, 
within a team, to destination IP addresses. The destinations IP address prefixes are randomly 
selected among the /24 space to ensure the data has representation across 724 space. Using 
a tool, like sc_analysis_dump tool^ included in the scamper distribution package [33] we can 
extract the information needed for this research from the probing cycles. This will be analyzed 
in Chapter 5. 

4.1.2 NPS Data 

The NPS probing methodology is a Python script program with the goal of minimizing the time 
required to gather the network information while maximizing the number of vertices and edges 
discovered. The authors of the program call the technique of adaptive network mapping ingress 
point spreading (IPS). IPS aims to increase probing efficiency by first inferring the number of 
ingress points for a given network, then for each new probe, selecting the vantage point with 
the highest likelihood to traverse an ingress point that has not been covered before [34]. The 
probing methodology uses data from a prior probing cycle to infer potential ingress points at 
different network boundaries for each target prefix. The process is designed to 

• Discover the degree of subnetting within edge networks through and iterative interroga- 
^Approx 9.5 million addresses. 

®Tool that produces output in textual format of each summary trace. It is included in Appendix A. 
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Figure 4.1: The map of all CAIDA monitor loeations from [32]. 


tion proeess [34]. 

• Diseover sourees of path diversity into networks by finding and exploiting the target’s 
ingress points [34]. 

In [34], the authors state that by spreading probing aeross ingresses it will prevent early termi¬ 
nation and therefore diseover more diverse paths. An example of IPS is illustrated in Figure 4.2. 
In Figure 4.2a, six previous vantage points are displayed to various destinations, (in red). The 
desired /16 is shaded in red in Figure 4.2a and has three destination IPs and eneompasses two 
ingresses into the /8 that lead to paths into the 716 prefix. In Figure 4.2b, vantage points 1 and 
2 are ehosen as first priority in rank order list. Vantage points 1 and 2 are ranked first beeause 
eaeh traversed a diverse ingress in Figure 4.2a. 
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However, IPS wants to discover a total rank order over all vantage points, typically larger than 
these six example vantage points to discover more. Therefore, IPS expands from /16 to /15 
prefixes (green shaded box in Figure 4.2b), which includes a new ingress point from point 4. 
This new ingress point rank prioritizes point 4 third because vantage point 3 shared the same 
/8 ingress point as point 2 in Figure 4.2a. The IPS probing methodology continues this ranking 
through /14, /13, etc. until all vantage points are rank ordered. 



(a) Target /16 prefix with two ingresses. 



(b) Expansion to find notational ingresses. 


Figure 4.2: IPS example of six vantage points from [34]. 


4.2 Data Selection and Preparation 

In Section 4.1, we described the two different probing methodologies and how each discovers 
the Internet. We explained the algorithms to help the reader understand the difference in the 
probing methodologies. We now explain to the reader the preparation of each probing cycle, the 
approximate collection time and the time it takes to allow a computer to analyze the information 
via our measures. 

4.2.1 Preparation 

The output from sc_analysis_dump tool^® is a list of traceroute like data. Specifically, the 
data provides the information that resembles a classic IP trace as well as a round-trip time 
(RTT) of packets sent and received between routers along the way. We take the data from 
the sc_analysis_dump, strip the RTT information and keep the interface data. This provides a 
list of interfaces and the links between them (given by the sequence of IPs as in Figure 4.3). 

tool used to provide a list of traceroute data in a readable script. Each line contains information about tbe 
each single trace to include interfaces visited and time between transmission. Details of tbe output are seen in 
Appendix A. 
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We then use the router interfaee as the vertex and identify two eonseeutive IPs whieh will be 
represented by an edge. This is easily done allowing us to string the interfaees together in 
order, from the first vertex to the seeond via an edge and the seeond to third via another edge in 
sequenee to represent the path taken for eaeh traee. We eontinue this proeess through eaeh traee 
in the probing eyele to obtain several paths forming a union and disearding edge and vertex 
multiplieity. From the union of the paths we build a graph representing a "snapshot" of the 
Internet as diseovered by eaeh probing methodology. This teehnique allows us to eompare the 
graphs by the various methods deseribed in Chapter 3. It is important to note that all interfaees 
respond with their address, due to seeurity settings, and we diseard all "q" responses. We have 
suggested additional future work to identify the amount disearded and possible way of assigning 
the unknown routers to a geographieal set, in Seetion 6.2 

4.2.2 Resources "Time” 

We previously diseussed the three or more days needed to obtain a eomplete CAIDA eyele that 
we eall a "snapshot" of the Internet. We also briefly explained how one of the main goals of 
the NPS probing methodology was to diseover more of the Internet’s topology in less time. In 
order for us to best eompare CAIDA’s and NPS’s probing methodologies, we first eompared 
two probing eyeles of eaeh methodology eolleeted as near simultaneously as possible using the 
same monitors, (in order to negate any differenee due to the time eolleeted or traffle load on 
the Internet). This allowed us to better eompare the two probing eyeles to eaeh other. Eaeh set 
of probing eyeles took a day to two days to eolleet, then another day to two to analyze. We 
were able to aeeomplish this by using a dedieated server to run the eompiling, parsing and test 
eontinuously over the four to five day period. We also ran a few additional tests, speeifieally 
diameter and radius testing per eountry, whieh took nearly two weeks of dedieated time on the 
server per eyele. 

4.2.3 Probing Data Challenges 

The Internet is ever ehanging, making it diffieult to prediet and aeeurately map. Additionally, 
not all the routers probed respond, making it diffieult to know exaetly whieh routers are eap- 
tured. During both CAIDA and NPS probing methodologies, we would reeeive a "q" for eaeh 
unresponsive router, as seen in Figure 4.3. This meant our probe would identify a router’s ex- 
istenee but not its IP address. Unresponsive routers oeeur beeause many eompanies and users 
inerease the seeurity settings on their routers to bloek ICMP traffle. This makes obtaining aeeu- 
rate information inereasingly diffieult sinee in our researeh, we diseard the routers with a "q." 
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Since we discard the unresponsive routers and therefore their interfaces, some of the Internets 
subsets appear disconnected, when they really are connected. 
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Figure 4.3: Comparison of traceroutes with same source and destination addresses with "q" 
response from [5]. 


Additionally, we used MaxMind geoIP lite database^ ^ for geographic reference data base for IP 
addresses. This is problematic because there is not a method for us to confirm the exact location 
assigned to each IP address. 


4.3 Methodology 

In the current thesis, data from two probing methodologies (CAIDA vs. NPS) was obtained 
with the goal of using graph comparisons to differentiate their outcomes. We ran general stan¬ 
dard statistics on multiple cycles of each of the two probing methodologies, as well as the two 
recently introduced metrics of VSD and BSD. 

Additionally, we partitioned the data (from the probing cycles) into geographical countries to 
identify the locations of the additional vertices and edges discovered by NPS probing method¬ 
ology and not discovered by CAIDA probing methodology. We also analyzed the possibility of 

*' The exact method of how the database is populated is proprietary, but is known to do worse on router interfaces 
than client addresses. 
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any bias between the probing cycles to identify if monitor location caused the increase in found 
vertices and edges in some countries. 

This data was then analyzed by reapplying the same statistics per country, as well as a graphical 
analysis in Gephi^^, VSD and BSD, and stable core intersection since these graphs were much 
smaller. 


visual interactive graph modeler and analyzer. 
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CHAPTER 5: 
Results 


In this chapter, we explain our findings of the two topology probing methods using the pre¬ 
defined comparison methods from [24]. We first examine each probing method’s data as a 
comprehensive graph, then sub-divide the graph into geographic countries to identify if there is 
bias towards a country. We also explore some possible pitfalls when using certain methods to 
ensure awareness when comparing the graphs. 

5.1 General Topological Comparison Between NFS Probing 
Method and CAIDA Probing Method 

We analyzed our data holistically using the existing measures we explained in 3.2 and 3.3 whose 
validity was addressed in [25]. In Table 5.1, we identify that the NFS probing method discov¬ 
ers more vertices and edges compared to the CAIDA probing method. We also observe that 
the average vertex degree, for the first two probing cycle sets, are slightly less for the NFS 
probing method, but not significantly. We believe this was attributed to the discovery of 12 
percent more vertices and only 10 percent more edges, which will decrease the average de¬ 
gree. However, after two additional NFS probing cycles, the average vertex degree increases 
and is greater than the first two CAIDA probing cycles meaning additional probing cycles are 
needed. We then compared the average clustering coefficient to identify if it implies difference 
in the structure of the data obtained from the two probing methods. Our data, in Table 5.2, does 
not support a significant difference, meaning that the structure discovered by the two probing 
methods are comparable to each other. We explored the Fearson coefficient to identify any dif¬ 
ference between how the two probing methods relate degree correlations, because the clustering 
coefficients were similar. We discovered that the Fearson coefficient for NFS’s probing method 
was negative, but small, which means the graph is slightly disassortative. CAIDA’s probing 
method had a positive Fearson coefficient, but also very small, which would mean a slightly 
assortative graph. However, we cannot draw any statistically significant conclusions from our 
data to the meaning of these results, rather conjecture that NFS probing methodology is slightly 
disassortative while CAIDA probing method remains slightly assortative. As future work, we 
recommend the study of additional probing cycles and their comparisons in order to be able to 
draw significant conclusions. 
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Number Vertices 

Number Edges 

Average Node Degree 

NFS 1 

524,366 

1,360,855 

5.190 

CAIDA I 

466,072 

1,236,530 

5.306 

NFS 2 

520,906 

1,333,079 

5.II8 

CAIDA 2 

464,553 

1,202,778 

5.178 

NFS 3 

543,073 

1,441,963 

5.310 

NFS 4 

554,753 

1,470,550 

5.302 


Table 5.1: The basic statistics for each probing cycle. 



Average Cluster Coefficient 

Fearson Coefficient 

NFS I 

0.017 

-0.034 

CAIDA I 

0.017 

0.021 

NFS 2 

0.016 

-0.033 

CAIDA 2 

0.017 

0.039 

NFS 3 

0.020 

-0.032 

NFS 4 

0.019 

-0.035 


Table 5.2: The connectivity table for each probing cycle. 


5.1.1 Dissimilarity Measures 

We compared VSD and BSD to evaluate each edge and vertex per graph to determine how 
similar the graphs are per probing cycle between NFS and CAIDA probing methods. In Table 
5.3, we present the VSD between all probing cycles of NFS and CAIDA probing methods. We 
identify around 40 percent different interfaces/vertices between the first two NFS and CAIDA 
probing cycles, which were collected nearly simultaneously, see Table 4.1. This confirms the 
two probing methods discover different sets of IF of the Internet, independent of time collected. 
Table 5.3 also displays the difference between NFS 3 and NFS 4 probing cycles, which were 
taken nearly four months apart, see Table 4.1. This seems to confirm the ever changing Internet. 
Interestingly, the VSD between NFS 1 versus NFS 2 is nearly seven percent less then CAIDA 
I versus CAIDA 2, (recall NFS 1 and CAIDA I at the same time, as was NFS 2 and CAIDA 
2). We attribute the small difference to the randomness in target selection within /24’s for 
CAIDA versus the NFS directed strategy. Furthermore, the difference between all pairwise 
NFS’ probing cycles are smaller than they are to CAIDA’s probing cycles. This could mean that 
NFS’ probing method discovers more stable interfaces/vertices than CAIDA’s probing method. 
We will address additional possible stable core results in Section 5.1.2. 

The BSD between the probing cycles displays the edges/paths that the interfaces/vertices used 
to transmit the packet information between routers. Interestingly, there is nearly a 50 percent 
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NPS I 

NPS 2 

NPS 3 

NPS 4 

CAIDA I 

CAIDA2 

NPS 1 

0.00 

0.25 

0.34 

0.34 

0.37 

0.43 

NPS 2 

0.25 

0.00 

0.26 

0.26 

0.43 

0.37 

NPS 3 

0.34 

0.26 

0.00 

0.10 

0.48 

0.43 

NPS 4 

0.34 

0.26 

O.IO 

0.00 

0.48 

0.44 

CAIDA I 

0.37 

0.43 

0.48 

0.48 

0.00 

0.32 

CAIDA 2 

0.43 

0.37 

0.43 

0.44 

0.32 

0.00 


Table 5.3: The values of VSD per probing cycle. 


difference in the edges the packets took between the probing cycles comparing the two probing 
methodologies, shown in Table 5.4. The difference appears to hold regardless of the probing 
method. There is a slight outlier between NPS 3 and NPS 4 probing cycles which only differ by 
24 percent. We attribute this to the small time frame between the same probing methodology 
(four days between NPS 3 and NPS 4 probing methods versus the nearly four months between 
NPS 1, NPS 2, NPS 3/4, CAIDA 1 and CAIDA 2 probing cycles). 



NPS I 

NPS 2 

NPS 3 

NPS 4 

CAIDA I 

CAIDA 2 

NPS I 

0.00 

0.43 

0.54 

0.54 

0.41 

0.54 

NPS 2 

0.43 

0.00 

0.44 

0.44 

0.54 

0.42 

NPS 3 

0.54 

0.44 

0.00 

0.24 

0.62 

0.54 

NPS 4 

0.54 

0.44 

0.24 

0.00 

0.62 

0.55 

CAIDA I 

0.41 

0.54 

0.62 

0.62 

0.00 

0.46 

CAIDA 2 

0.54 

0.42 

0.54 

0.55 

0.46 

0.00 


Table 5.4: The values of BSD per probing cycle. 


5.1.2 Commonality Measures 

This subsection introduces the idea of measuring a stable core, by which we mean the part of 
the Internet that is discovered by the majority of the probing cycles, see Section 3.3. We used 
intersections of probing cycles to discover the graph (vertices and edges) that all probing cycles 
discovered. The basic statistics of the possible stable core are found in Table 5.5. We predicted 
that the number of vertices and edges would decrease to a smaller subset during every iteration, 
which our analysis supports. Interestingly, the vertex and edge counts still remain relatively 
large, about 40 percent of the original graphs. This shows promising evidence that there exist a 
relatively stable core, but we need to compare additional probing cycles before we can confirm. 
We believe that with this knowledge we could possibly identify the preferred crucial Internet 
infrastructure that is important to the speed of the Internet we currently enjoy and identify those 
paths that are critical for vulnerability assessments. 
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Node Count 

Edge Count 

Avg Degree 

Clustering 

Fearson 

Both CAIDA Graphs 

316,422 

659,788 

4.170 

0.011 

0.084 

First Two NFS 

392,791 

761,599 

3.878 

0.008 

-0.025 

Interseetion of NFS 1-2 
and CAIDA 1-2 

233,607 

432,412 

3.702 

0.007 

0.083 

Interseetion NFS 1-2, 
CAIDA 1-2 with NFS 3 

204,860 

334,689 

3.267 

0.005 

0.092 

Interseetion NFS 1-2, 
CAIDA 1-2 withNFS 4 

204,508 

333,928 

3.266 

0.005 

0.094 


Table 5.5: The graph statisties for the interseetion of graphs. 


We tested our eonjeeture of the existanee of a stable eore using the interseetion of the first 
four graphs, (NFS 1, NFS 2 and CAIDA 1 and CAIDA 2) and tested them on the last two 
probing eyeles, (NFS 3 and NFS 4). In Table 5.5, the test on both NFS 3 and NFS 4 both have 
a vertex eount differenee of only 352 vertiees/IFs with only an edge eount differenee of only 
761. This is signifieant sinee there are over 500,000 vertiees and over 1,200,000 edges for eaeh 
eyele. This shows that the interseetions of the graphs over time might lead to the diseovery of 
a stable eore. The researehers reeommend additional future probing eyeles to test to identify 
the rate of ehange and determine the number of probing eyeles to have statistieal data to ensure 
validity to our findings. We will also test the stable eore eonjeeture on smaller eountry graphs 
in Seetion 5.2.3. 

5.2 Per Country Topological Comparison Between NPS and 
CAIDA Probing Methods 

From our findings in Seetion 5, we parsed the eyeles to identify the eountries eaeh traversed 
to determine if bias existed. In Figure 5.1^^, both NFS and CAIDA probing methods diseover 
the same top 10 eountries for vertex and edge eount, but NFS’ probing method found signifi- 
eantly more IFs in the United States, China and South Korea than CAIDA’s probing method. 
In Figure 5.2, NFS probing method also did better at diseovering more edges in the United 
States, China and signifieantly better in South Korea. It makes sense for a probing method to 
probabilistieally find more vertiees and edges in China and the United States beeause both have 
more users than any other eountry [2]; see Table 5.8 and diseussion in Seetion 5.2.1 we will 
further diseuss. It is worth mentioning that there are not any monitors in China. That is, we 
analyzed the vantage point monitors, seen in Figure 5.3, to see if there was any diseovery bias 

^^The x-axis is by country aligned per probing iteration. 
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from monitor vantage point. Our data, from the monitor vantage point, does not support any 
monitor bias beeause they used nearly the same monitor vantage points with little effeet on IP 
diseovery per probing cyele. 


160000 

140000 B 

' United States 
120000 ; 
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®NPS3 
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Figure 5.1: The vertex eount per eountry. 

We eompared the elustering eoeffieient to test eonnectivity of the different cycles at the country 
level. In Figure 5.4, each probing method follows generally the same slope. The graph is 
skewed a little for NPS probing method because it does not find as many countries as CAIDA 
probing. Notice, however that NPS probing method still finds the same amount of connected 
countries. In fact, NPS found 23 percent of countries with a clustering coefficient of zero, while 
CAIDA finds 27 percent of countries with a clustering coefficient of zero. This means NPS 
probing method may actually be better at finding connectivity than CAIDA probing method. 

Next, we compared highest and lowest country clustering coefficient to determine which coun¬ 
tries were more strongly connected. We conjectured the clustering coefficient might identify 
if there was any trend to security or censorship. We first hypothesized that a country with a 
high clustering coefficient would show less censored countries, while a country with a low clus¬ 
tering coefficient would mean more censorship. We conjectured this because a country with a 
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CAIDA 1 

CAIDA 2 

NPS 1 

NPS 2 

NPS 3 

NPS 4 

United States 

129,161 

125,379 

138,127 

133,330 

136,636 

139,258 

China 

57,099 

57,528 

70,892 

71,687 

72,889 

74,013 

Japan 

40,211 

39,653 

41,177 

40,273 

41,514 

42,364 

South Korea 

30,889 

30,377 

38,283 

37,704 

39,547 

40,371 

Germany 

24,571 

23,679 

28,423 

28,437 

29,320 

29,843 

Great Britain 

23,781 

22,776 

24,475 

23,416 

24,962 

24,950 

Brazil 

22,904 

21,997 

24,377 

22,927 

24,491 

25,032 

Canada 

22,044 

21,156 

21,815 

20,298 

22,097 

22,672 

Russian Federation 

19,103 

18,023 

16,772 

15,142 

16,756 

17,031 

Italy 

16,108 

15,293 

16,621 

15,267 

15,769 

16,710 

Netherlands 

12,336 

11,644 

16,350 

16,216 

17,678 

18,591 


Table 5.6: Sample of the top vertex count by country. 
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Figure 5.2: The edge count per country. 


low clustering coefficient will show less of the observed Internet or only the external routers 
leading to paths. Unfortunately, our data did not support our hypothesis because our data did 
not recognize known heavily censored countries. Additionally, we observed often low cluster¬ 
ing coefficient countries were small island countries, that happen to have few vertices and few 
edges, hence the presence of a few triangles (3-cycles) made the clustering coefficient large. 


36 






















CAIDA 1 

CAIDA 2 

NFS 1 

NFS 2 

NFS 3 

NFS 4 

United States 

325,872 

306,171 

314,434 

292,848 

306,988 

314,289 

China 

174,641 

179,043 

239,681 

253,089 

273,374 

280,173 

South Korea 

75,106 

73,005 

133,237 

131,953 

141,314 

144,373 

Japan 

86,226 

87,284 

87,437 

89,684 

93,395 

95,700 

Brazil 

68,526 

64,707 

74,115 

69,335 

76,316 

76,224 

Germany 

63,569 

58,029 

68,667 

64,492 

67,361 

69,180 

Great Britain 

60,580 

56,948 

64,253 

59,939 

61,600 

61,430 

Canada 

64,805 

56,432 

59,141 

50,198 

55,732 

56,580 

Italy 

47,068 

43,011 

42,475 

38,837 

40,879 

42,502 

Spain 

44,646 

37,571 

39,064 

32,272 

34,504 

35,738 

Russian Federation 

48,455 

42,525 

38,778 

32,849 

37,705 

38,176 

Sweden 

30,414 

25,742 

33,919 

32,295 

33,119 

35,144 

France 

38,623 

35,852 

31,458 

26,864 

26,643 

27,285 

Netherlands 

31,746 

26,974 

31,308 

27,757 

30,494 

31,418 


Table 5.7: Sample of the top edge eount by eountry. 



Figure 5.3: The unique monitor loeation per probing eyele. 
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Figure 5.4: The elustering eoeffieient distributions by eountry. 


5.2.1 Country IP Allocation per Graph 

We previously noted the signifieantly larger number of vertex and edge eounts of the NFS 
probing method for diseovering IP addresses within China and South Korea, but still have not 
identified the reason. We already tested the monitor loeation in Seetion 5.2 without any results. 
This led us to analyze known alloeated IP spaee to identify any trends. Table 5.8 displays the 
assigned sorted IP spaee pereentage per eountry to what we diseovered during eaeh probing 
eyele.^^ We normalized eaeh eountry’s probing eyele diseovery by the number of total vertiees 
to have a method of comparison. We did this because the assigned IP space percentages are 
done in a similar fashion by dividing the total assigned IP space for each country by the total 
possible IP space. In Table 5.8, we see the NFS and CAIDA probing methodologies match the 
known percentages of a country’s allocated IP space. Interestingly, none of the probing cycles 
discover the 35 percent that the United States has assigned. However, in all seven cycles the 
United States had the largest discovered IP space. A possible reason for this is the number 
of unresponsive routers, "q" as discussed in Section 4.2.3, that were possibly returned within 

''^List of all countries found Appendix B. 
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the United States. We would suggest additional research in this area to identify probabilistic 
IP/vertex location from the returned "q" for better representation. 



Assigned 
IP Space 

CAIDA 1 

CAIDA 2 

NPS 1 

NPS 2 

NPS 3 

NPS 4 

United States 

35.90% 

14.41% 

14.72% 

16.18% 

16.58% 

16.07% 

16.00% 

China 

7.70% 

6.37% 

6.75% 

8.30% 

8.91% 

8.57% 

8.50% 

Japan 

4.70% 

4.49% 

4.65% 

4.82% 

5.01% 

4.88% 

4.87% 

Great Britain 

2.90% 

2.65% 

2.67% 

2.87% 

2.91% 

2.94% 

2.87% 

Germany 

2.80% 

2.74% 

2.78% 

3.33% 

3.54% 

3.45% 

3.43% 

South Korea 

2.60% 

3.45% 

3.57% 

4.48% 

4.69% 

4.65% 

4.64% 

France 

2.20% 

1.66% 

1.71% 

1.69% 

1.68% 

1.55% 

1.55% 

Canada 

1.90% 

2.46% 

2.48% 

2.56% 

2.52% 

2.60% 

2.60% 

Italy 

1.20% 

1.80% 

1.79% 

1.95% 

1.90% 

1.85% 

1.92% 

Brazil 

1.10% 

2.56% 

2.58% 

2.86% 

2.85% 

2.88% 

2.88% 

Australia 

1.10% 

1.28% 

1.33% 

1.18% 

1.17% 

1.17% 

1.14% 

Netherlands 

1.10% 

1.38% 

1.37% 

1.91% 

2.02% 

2.08% 

2.14% 

Russian Fed¬ 
eration 

1.00% 

2.13% 

2.12% 

1.96% 

1.88% 

1.97% 

1.96% 

Taiwan 

0.80% 

1.39% 

1.40% 

1.12% 

1.11% 

1.16% 

1.15% 

India 

0.80% 

1.15% 

1.10% 

1.10% 

1.01% 

1.03% 

1.02% 


Table 5.8: The top 15 countries of IP space allocation. 


5.2.2 Some results on the diameter and radius for the country graphs 

We analyzed the data to identify if one probing method was better at discovering depth versus 
breadth, previously discussed in Section 3.2. We did this by subtracting the radius from the 
diameter for each graph. Recall from 3.2, the diameter was the longest shortest path, while the 
radius was the smallest eccentricity. Therefore, the higher difference between the diameter and 
the radius, would indicate better breadth into a network, while a smaller number would indicate 
better depth. Unfortunately, to compute the radius and diameter each graph must be connected. 
This was a problem when we parsed our data to the country level, because nearly a third of the 
countries were not connected due to unresponsive routers or other reasons. In Figure 5.5, we 
can see that there is a fairly even distribution with the mean diameter minus radius, /i = .37 
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and most of the countries falling within two standard deviations, (the dashed lines). We also 
annotated the third standard deviation with a solid line and identified the outliers by name. 
Interestingly, the outliers were all island nations. However, we were still able to compare the 
remaining data and we observed that NPS discovered eight percent more breadth than CAIDA. 


NPS (Diameter-Radius) - CAiDA (Diameter-Radius) 



•IS 


_Each dot represents a country value in round 1 _ 

data 

Figure 5.5: The diameter minus the radius (depth versus breadth). 


5.2.3 Case Studies for some Countries 

From our results in Section 5.2, we chose to further analyze China, and South Korea. We also 
will display an issue we discovered when comparing the clustering coefficients per country, 
specifically with Wallis and Futuna. 

Case Study: China 

The NPS probing methodology discovered significantly more vertices and edges than the CAIDA 
probing methodology, as seen in Table 5.6 and Table 5.7. We further explored the data to deter¬ 
mine if the monitor vantage point impacted the results. In Figure 5.6, we notice that the probing 
algorithms used nearly the same monitor vantage points for each probing cycle. We determine 
from this data that the monitor vantage point has little effect on the discovery since nearly the 
same monitor vantage points were used. It is important to note that generally vantage point does 
have a major impact on discovered topology, but our data did not. 
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United States Italy Sweeden Neatherlands Iceland Australia ffN/A 

Monitor Location by Country 

• CAIOARdl ■CAIOARd2 NPSRdl • NPS Rd 2 NPS Rd 3 SNPSRdA 


Figure 5.6: The probing monitors for China. 


Table 5.9 displays the results of the VSD. We identified an average ehange in vertiees of 30-50 
pereent appear between CAIDA 1 and the NPS probing methods. Interestingly the differenee 
between NPS 1 and CAIDA I are 30-52 pereent different even though they were probed nearly 
simultaneously. This further shows how different the probing methods are at diseovering the 
Internet. In Table 5.10, the BSD is highest between CAIDA 1 and the last two NPS probing 
methods. These results show how the different probing methodologies diseover different sets of 
IPs even in smaller subgraphs. It also eonfirms the ever ehanging presenee of the Internet. 



NPS I 

NPS 2 

NPS 3 

NPS 4 

CAIDA I 

CAIDA 2 

NPS 1 

0.00 

0.25 

0.42 

0.42 

0.30 

0.52 

NPS 2 

0.25 

0.00 

0.34 

0.34 

0.36 

0.29 

NPS 3 

0.42 

0.34 

0.00 

0.10 

0.57 

0.50 

NPS 4 

0.42 

0.34 

O.IO 

0.00 

0.57 

0.50 

CAIDA 1 

0.30 

0.36 

0.57 

0.57 

0.00 

0.30 

CAIDA 2 

0.52 

0.29 

0.50 

0.50 

0.30 

0.00 


Table 5.9: The VSD per graph for China. 


We also tried to diseover the existenee of a stable eore within China. In Table 5.11, we eon- 
dueted analysis on the first four graphs (first two NPS and first two CAIDA) as learning and 
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NFS I 

NFS 2 

NFS 3 

NFS 4 

CAIDA I 

CAIDA 2 

NFS 1 

0.00 

0.51 

0.61 

0.61 

0.41 

0.63 

NFS 2 

0.51 

0.00 

0.50 

0.50 

0.53 

0.42 

NFS 3 

0.61 

0.50 

0.00 

0.20 

0.69 

0.59 

NFS 4 

0.61 

0.50 

0.20 

0.00 

0.70 

0.60 

CAIDA I 

0.41 

0.53 

0.69 

0.70 

0.00 

0.49 

CAIDA 2 

0.63 

0.42 

0.59 

0.60 

0.49 

0.00 


Table 5.10: The BSD per graph for China. 


comparing on the remaining two NFS probing cycles (NFS 3 and NFS 4). We identified that 
the intersection of the first four graphs might have identified a possible stable core because 
when we compared to NFS 3 and NFS 4 graphs, the difference was only ten vertices and less 
than a thousand edges. An additional interesting insight is how quickly China went from an 
average vertex count of approx 60,000 (in Table 5.6) to only 5,000 then 2,500 (in Table 5.11). 
This could show how volatile China’s Internet topology is, especially when one compares the 
results to those of South Korea in Section 5.2.3. We hypothesis this may be caused by Chinas 
restrictive Internet, but further work is needed to confirm. 


Graphs 

Node Count 

Edge Count 

Avg Degree 

Clustering 

Fearson 

Both CAIDA Graphs 

5,083 

7,704 

3.031 

O.OII 

0.027 

First Two NFS 

4,722 

6,091 

2.580 

0.015 

-0.086 

2 CAIDA vs 2 NFS 

3,158 

4,175 

2.644 

0.012 

0.020 

Intersection NFS 1-2, CAIDA 1-2 with NFS 3 

2,564 

3,013 

2.350 

O.OII 

0.044 

Intersection NFS 1-2, CAIDA 1-2 with NFS 4 

2,554 

2,987 

2.339 

0.012 

0.053 


Table 5.11: The basic stats of China’s stable core. 


South Korea 

As well. South Korea was interesting because of the difference in discovered vertices and edges 
from the NFS and CAIDA probing methodologies. Table 5.12 displays the VSD results from the 
probing cycles. The trend continues in the difference between the greatest change between the 
CAIDA and the NFS probing methodologies. In Table 5.13, the BSD shows a large difference in 
those edges that are detected per probing cycle, but not as much as China. Again, the large VSD 
and BSD between cycles confirms the ever changing inferred topology. When we conducted the 
stable core testing against the probing cycles for South Korea, the vertex and edge counts of the 
intersection graphs only decrease by approx 40 percent, from average of 37,000 vertices (in 
Table 5.6) to approximately 15,000 vertices (in Table 5.14). This is intriguing because we 
observed a large decrease in the graph to stable core graphs (compared that found in Table 5.6 
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and Table 5.7) of China but not in South Korea. The results might show that South Korea has a 
more stable Internet. 



NFS 1 

NFS 2 

NFS 3 

NFS 4 

CAIDA I 

CAIDA 2 

NFS 1 

0.00 

0.22 

0.30 

0.31 

0.31 

0.38 

NFS 2 

0.22 

0.00 

0.24 

0.24 

0.36 

0.30 

NFS 3 

0.30 

0.24 

0.00 

0.10 

0.41 

0.37 

NFS 4 

0.31 

0.24 

0.10 

0.00 

0.41 

0.37 

CAIDA 1 

0.31 

0.36 

0.41 

0.41 

0.00 

0.31 

CAIDA 2 

0.38 

0.30 

0.37 

0.37 

0.31 

0.00 


Table 5.12: The VSD per graph for South Korea. 



NFS I 

NFS 2 

NFS 3 

NFS 4 

CAIDA I 

CAIDA 2 

NFS I 

0.00 

0.37 

0.46 

0.47 

0.47 

0.57 

NFS 2 

0.37 

0.00 

0.39 

0.39 

0.56 

0.48 

NFS 3 

0.46 

0.39 

0.00 

0.22 

0.62 

0.58 

NFS 4 

0.47 

0.39 

0.22 

0.00 

0.63 

0.58 

CAIDA I 

0.47 

0.56 

0.62 

0.63 

0.00 

0.50 

CAIDA 2 

0.57 

0.48 

0.58 

0.58 

0.50 

0.00 


Table 5.13: The BSD per graph for South Korea. 


Graphs 

Node Count 

Edge Count 

Avg Degree 

Clustering 

Fearson 

Both CAIDA Graphs 

29,706 

83,838 

5.645 

0.006 

0.I5I 

First Two NFS 

21,080 

37,087 

3.519 

0.009 

0.083 

2 CAIDA vs 2 NFS 

17,164 

29,556 

3.444 

0.006 

0.085 

Intersection NFS 1-2, CAIDA 1-2 with NFS 3 

15,436 

24,387 

3.160 

0.005 

0.126 

Intersection NFS 1-2, CAIDA 1-2 with NFS 4 

15,443 

24,296 

3.147 

0.005 

0.126 


Table 5.14: The basic stats of South Korea’s stable core. 


Wallis and Futuna 

In Section 5.2 we discussed how the clustering coefficient could easily skew results if con¬ 
sidered in isolation. We identified Wallis and Futuna, after sorting the data to identify those 
countries with the highest clustering coefficients. We originally conjectured the possibility of 
high clustering coefficient with countries that restrict control of the Internet, however our data 
did not support this conjecture. We found countries, (e.g., Wallis and Futuna in Figure 5.7), that 


43 






I 


Figure 5.7: Graph of Wallis and Futuna with high clustering coefficient. 

had high clustering coefficient, but were not on any lists of known Internet restricted countries. 
Wallis and Futuna is a relatively small nation but it had one of the highest clustering coefficients 
in one of our cycles because of the relatively small number of vertices connected to each other, 
almost a tree, so a few triangles (3-cycles) made a big impact, circled in Figure 5.7. This outlier 
was easy to visualize due to the small number of overall vertices, but larger graphs may not 
easily show this result. We caution the use of clustering coefficient when comparing graphs 
unless you can easily visualize your graph or other measures are used in combination with the 
clustering coefficient (such as density of edges in a graph). 
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CHAPTER 6: 

Future Work and Conclusion 


In this chapter, we present our findings and provide insights into areas that might require further 
research. 


6.1 Summary 

Our goal was to provide an in-depth comparison of the NPS probing methodology and CAIDA 
probing methodology using existing graphical measurements. We also applied the same anal¬ 
ysis to country specific sub-sets to see how the measurements behave with some surprising 
results. There is still an abundance of research that is needed to understand how the Internet is 
connected and its properties. 

6.2 Future Work 

We have barely scraped the surface for using these measurements, and there remains much to 
be explored. The following are some areas for possible future research. 

(1) Perform more statistical analysis on additional runs 

We were limited on the amount of probing cycles we could accomplish during the du¬ 
ration of our research. Making specific claims, such as the Pearson coefficient, about 
how the Internet or algorithms behaves as a result of this study is not statistically sup¬ 
ported. We also only used a limited number of monitor vantage points and suggest a 
more methodical study of how the probing cycles discover topology differ by vantage 
point. Specifically, do certain monitors discover more depending on geographic region, 
or does the geographic distance even affect the return of trace information per destination 
IP. 

(2) Does the stable core of the Internet exist or does a stable core exist in a sub-set of the In¬ 
ternet? Does the existence of a stable core give us any additional information, specifically 
threat vulnerability and assessment? 
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In Chapter 5, we conjectured that the intersections of intersections would produce a stable 
core of vertices that we could use to conduct threat analysis and other network security 
studies. We introduced the intersection of intersections to discover the stable core, but 
did not prove how many graph intersections it would take to confirm the existence of a 
stable core. Additionally, it would be interesting to know if the number of probing cycles 
needed to discover a stable core differ per country or other subset. Specifically, does a 
small set of intersection vertices compared to the total vertex found indicate a censored 
or heavily restricted Internet region, similar to our results for China in Section 5.2.3. 

(3) How many unions of graphs does it take to represent the known IP allocation per country? 

We studied the distribution of vertices discovered per country by the known alloted IP 
space assigned in Section 5.2.1 and noted the disparity of those discovered versus the 
amount allocated. It would be interesting to identify how many probing cycles it would 
take to gather enough data to provide an accurate distribution. 

(4) Identify the number of "q" returns per country and group them with the number of dis¬ 
covered IP vertices. 

We recommend further study of this to identify if the percentage of discovered IPs more 
closely represents that of the known IP allocation by country and further work from [35]. 
The actual location of the "q" is difficult if not impossible to know, but you can infer the 
location of the router that returns the "q" by the preceding and following router locations. 
The researchers propose a possible use of bins for those "q" returned. For example, if a 
trace returned US, US, "q", US, CH, then it would fall into a confident bin for US. Al¬ 
ternatively, if a trace returned (US, US, "q", "q", CH), then the "q"s could be apart of the 
United States or China, therefore a second bin per country for possible "q"s that belong 
to the country. Obviously there would be a lot of inferring to the location in this process, 
and we only suggest this method, while other methods could also work. 

6.3 Conclusion 

We performed numerous known standard graph comparisons in Section 5. Some of the results 
led to additional questions instead of answering our original questions, but we have made some 
headway. The CAIDA and NPS probing methodologies both accomplish the same task of giving 
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an inferred map of the Internet topology, though NPS seems to discover more IPs in less time, 
as suggested in [36], from one probing cycle. In our research, we provided a concise study for 
a better understanding about the inferred representation of each of the probing methodologies. 
The researchers also offer additional directions to provide better insight into the inferred topol¬ 
ogy. Specifically, we hope our conjecture of a stable core, (see Section 5.2.3 and Section 5.2.3) 
could prove impactful. 

One of the greatest challenges to this study was (1) the sheer size of the data sets^^ that we 
had to first strip into usable forms, and (2) needing a dedicated server to run the analytical 
analysis. Internet topology is not a topic that will have a simple solution from a single paper, but 
continued work to understand how components link and share the information on the Internet 
can prove insightful. 

We also believe some of the metrics we used could be useful in other complex networks, such as 
biological and social networks. It could prove insightful to understand the most logical next step 
or underlying structure or the existence of a stable core or backbone for information distribution. 


^^Typical size of graph comparisons used six to ten Gigabyte (GB) of RAM per test for upwards of 14 days. 
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APPENDIX A: 

SC Analysis Dump Format 


# This file contains an ASCII representation of the IPv4 paths stored in 

# the binary skitter arts++ and scamper warts file formats. 

# 

# This ASCII file format is in the sk_analysis_dump text output 

# format: imdc.datcat.org/format/l-003W-7 

# 

# There is one trace per line, with the following tab-separated fields: 


# 

# 

# 1. Key -- Indicates the type of line and determines the meaning of the 

# remaining fields. This will always be ’T’ for an IP trace. 

# 

# - Header Fields - 

# 

# 2. Source -- Source IP of skitter/scamper monitor performing the trace. 

# 


# 3. Destination -- Destination IP being traced. 

# 

# 4. Listid -- ID of the destination list containing this destination 

# address. 

# 

# This value will be zero if no list ID was provided. (uint32_t) 

# 

# 5. Cycleld -- ID of current probing cycle (a cycle is a single run 

# through a given list). For skitter traces, cycle IDs 

# will be equal to or slightly earlier than the timestamp 

# of the first trace in each cycle. There is no standard 

# interpretation for scamper cycle IDs. 


49 







# 

# This value will be zero if no cycle ID was provided. (uint32_t) 

# 

# 6. Timestamp -- Timestamp when trace began to this destination. 

# 

# - Reply Fields - 

# 

# 7. DestReplied -- Whether a response from the destination was received. 

# 


# R - Replied, reply was received 

# N - Not-replied, no reply was received; 

# Since skitter sends a packet with a TTL of 255 when it halts 

# probing, it is still possible for the final destination to 

# send a reply and for the HaltReasonData (see below) to not 

# equal no_halt. Note: scamper does not perform this last-ditch 

# probing at TTL 255. 

# 

# 8. DestRTT -- RTT (ms) of first response packet from destination. 

# 0 if DestReplied is N. 

# 

# 9. RequestTTL -- TTL set in request packet which elicited a response 

# (echo reply) from the destination. 

# 0 if DestReplied is N. 

# 

# 10. ReplyTTL -- TTL found in reply packet from destination; 

# 0 if DestReplied is N. 

# 

#-Halt Fields- 


# 

# 11. HaltReason -- The reason, if any, why incremental probing stopped. 

# 

# 12. HaltReasonData -- Extra data about why probing halted. 

# 

# HaltReason HaltReasonData 
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# - 

# S (success/no_halt) 0 

# U (icmp_unreachable) icmp_code 

# L (loop_detacted) loop_length 

# G (gap_detected) gap_limit 

# 

# -Path Fields - 

# 


# 13. PathComplete -- Whether all hops to destination were found. 

# 

# C - Complete, all hops found 

# I - Incomplete, at least one hop is missing (i.e., did not 

# respond) 

# 

# 14. PerHopData -- Response data for the first hop. 

# 

# If multiple IP addresses respond at the same hop, response data 

# for each IP address are separated by semicolons: 

# 

# IP,RTT,numTries (for only one responding IP) 

# IP,RTT,numTries;IP,RTT,numTries;... (for multiple responding IPs) 

# 

# where 

# 

# IP -- IP address which sent a TTL expired packet 

# RTT -- RTT of the TTL expired packet 

# num_tries -- num tries before response received from TTL. 

# 

# This field will have the value ’q’ if there was no response at 

# this hop. 

# 

# 15. PerHopData -- Response data for the second hop in the same format 

# as field 14. 

# 
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# ... 

# 

# N. PerHopData -- Response data for the destination 

# (if destination replied). 

# 
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APPENDIX B: 

P Space by Country 


COUNTRY 

CAIDA 1 

CAIDA 2 

NPSl 

NPS2 

NPS 3 

NPS 4 

Afghanistan 

0.074% 

0.081% 

0.042% 

0.026% 

0.028% 

0.026% 

Albania 

0.082% 

0.083% 

0.037% 

0.042% 

0.053% 

0.053% 

Algeria 

0.292% 

0.262% 

0.147% 

0.067% 

0.080% 

0.083% 

American Samoa 

0.009% 

0.010% 

0.013% 

0.006% 

0.016% 

0.013% 

Andorra 

0.022% 

0.024% 

0.003% 

0.007% 

0.006% 

0.006% 

Angola 

0.129% 

0.116% 

0.074% 

0.090% 

0.095% 

0.095% 

Anguilla 

0.006% 

0.008% 

0.002% 

#N/A 

0.007% 

0.003% 

Antarctica 

0.030% 

0.035% 

0.002% 

0.003% 

0.004% 

0.004% 

Antigua and Barbuda 

0.024% 

0.024% 

0.005% 

0.012% 

0.008% 

0.007% 

Argentina 

0.669% 

0.660% 

0.707% 

0.661% 

0.650% 

0.640% 

Armenia 

0.157% 

0.159% 

0.103% 

0.098% 

0.129% 

0.122% 

Aruba 

0.030% 

0.028% 

0.004% 

0.005% 

0.002% 

0.002% 

Australia 

1.282% 

1.331% 

1.180% 

1.168% 

1.170% 

1.135% 

Austria 

0.756% 

0.769% 

0.697% 

0.701% 

0.699% 

0.722% 

Azerbaidjan 

0.132% 

0.123% 

0.058% 

0.054% 

0.069% 

0.069% 

Bahamas 

0.129% 

0.133% 

0.056% 

0.055% 

0.064% 

0.062% 

Bahrain 

0.171% 

0.181% 

0.146% 

0.140% 

0.154% 

0.150% 

Bangladesh 

0.232% 

0.231% 

0.219% 

0.193% 

0.213% 

0.209% 

Barbados 

0.066% 

0.065% 

0.080% 

0.062% 

0.077% 

0.076% 

Belarus 

0.141% 

0.170% 

0.126% 

0.118% 

0.135% 

0.132% 

Belgium 

0.643% 

0.637% 

0.500% 

0.461% 

0.533% 

0.536% 

Belize 

0.169% 

0.172% 

0.069% 

0.033% 

0.040% 

0.032% 

Benin 

0.011% 

0.011% 

0.005% 

0.007% 

0.004% 

0.004% 

Bermuda 

0.124% 

0.130% 

0.108% 

0.118% 

0.120% 

0.113% 

Bhutan 

0.047% 

0.050% 

0.094% 

0.080% 

0.090% 

0.087% 

Bolivia 

0.191% 

0.180% 

0.261% 

0.230% 

0.212% 

0.211% 

Bosnia-Herzegovina 

0.264% 

0.256% 

0.309% 

0.314% 

0.303% 

0.305% 

Botswana 

0.053% 

0.051% 

0.042% 

0.048% 

0.039% 

0.038% 


53 





Brazil 

2.555% 

2.582% 

2.855% 

2.851% 

2.880% 

2.876% 

British Indian Ocean Territory 

0.004% 

0.004% 

#N/A 

#N/A 

#N/A 

#N/A 

Brunei Darussalam 

0.074% 

0.075% 

0.037% 

0.035% 

0.035% 

0.036% 

Bulgaria 

0.547% 

0.530% 

0.566% 

0.511% 

0.521% 

0.515% 

Burkina Faso 

0.042% 

0.051% 

0.097% 

0.075% 

0.090% 

0.089% 

Burundi 

0.008% 

0.006% 

#N/A 

#N/A 

0.001% 

0.001% 

Cambodia 

0.084% 

0.086% 

0.061% 

0.076% 

0.109% 

0.112% 

Cameroon 

0.075% 

0.075% 

0.080% 

0.054% 

0.060% 

0.064% 

Canada 

2.459% 

2.483% 

2.555% 

2.524% 

2.598% 

2.605% 

Cape Verde 

0.010% 

0.011% 

0.013% 

0.010% 

0.020% 

0.018% 

Cayman Islands 

0.043% 

0.043% 

0.029% 

0.029% 

0.034% 

0.030% 

Central African Republic 

0.003% 

0.003% 

#N/A 

#N/A 

#N/A 

#N/A 

Chad 

0.004% 

0.002% 

0.002% 

#N/A 

0.001% 

0.002% 

Chile 

0.539% 

0.540% 

0.399% 

0.382% 

0.383% 

0.390% 

China 

6.370% 

6.752% 

8.303% 

8.915% 

8.571% 

8.504% 

Christmas Island 

0.002% 

0.002% 

#N/A 

#N/A 

#N/A 

#N/A 

Cocos (Keeling) Islands 

0.003% 

0.004% 

#N/A 

#N/A 

#N/A 

#N/A 

Colombia 

0.607% 

0.575% 

0.600% 

0.587% 

0.552% 

0.557% 

Comoros 

0.016% 

0.013% 

0.005% 

0.007% 

0.008% 

0.018% 

Congo 

0.023% 

0.020% 

0.024% 

0.017% 

0.011% 

0.016% 

Cook Islands 

0.026% 

0.025% 

0.028% 

0.033% 

0.049% 

0.047% 

Costa Rica 

0.376% 

0.321% 

0.204% 

0.193% 

0.165% 

0.161% 

Croatia 

0.400% 

0.340% 

0.335% 

0.294% 

0.283% 

0.284% 

Cuba 

0.099% 

0.108% 

0.063% 

0.035% 

0.037% 

0.036% 

Cyprus 

0.212% 

0.196% 

0.233% 

0.191% 

0.167% 

0.168% 

Czech Republic 

1.046% 

1.033% 

1.197% 

1.177% 

1.178% 

1.211% 

Denmark 

0.804% 

0.773% 

0.899% 

0.908% 

0.940% 

0.941% 

Djibouti 

0.041% 

0.075% 

0.013% 

0.013% 

0.013% 

0.013% 

Dominica 

0.023% 

0.029% 

0.017% 

0.010% 

0.015% 

0.015% 

Dominican Republic 

0.263% 

0.252% 

0.121% 

0.104% 

0.115% 

0.109% 

Ecuador 

0.294% 

0.301% 

0.354% 

0.335% 

0.324% 

0.312% 

Egypt 

0.773% 

0.709% 

0.372% 

0.379% 

0.349% 

0.345% 
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El Salvador 

0.231% 

0.220% 

0.105% 

0.121% 

0.119% 

0.114% 

Equatorial Guinea 

0.030% 

0.028% 

0.010% 

0.026% 

0.025% 

0.029% 

Eritrea 

0.003% 

0.005% 

#N/A 

#N/A 

#N/A 

#N/A 

Estonia 

0.332% 

0.309% 

0.509% 

0.484% 

0.486% 

0.486% 

Ethiopia 

0.089% 

0.087% 

0.050% 

0.032% 

0.044% 

0.043% 

Ealkland Islands 

0.006% 

0.009% 

0.005% 

0.006% 

0.031% 

0.029% 

Earoe Islands 

0.071% 

0.069% 

0.053% 

0.048% 

0.044% 

0.044% 

Fiji 

0.129% 

0.122% 

0.049% 

0.051% 

0.053% 

0.046% 

Einland 

0.722% 

0.704% 

0.574% 

0.547% 

0.546% 

0.543% 

Erance 

1.662% 

1.713% 

1.687% 

1.680% 

1.548% 

1.546% 

Erance (European Territory) 

0.003% 

0.002% 

0.002% 

#N/A 

0.001% 

0.001% 

Erench Guyana 

0.077% 

0.076% 

0.019% 

0.020% 

0.022% 

0.021% 

Erench Southern Territories 

0.004% 

0.002% 

#N/A 

#N/A 

#N/A 

#N/A 

Gabon 

0.152% 

0.158% 

0.050% 

0.062% 

0.070% 

0.069% 

Gambia 

0.002% 

0.004% 

#N/A 

#N/A 

#N/A 

#N/A 

Georgia 

0.249% 

0.252% 

0.339% 

0.308% 

0.318% 

0.331% 

Germany 

2.741% 

2.779% 

3.329% 

3.536% 

3.448% 

3.429% 

Ghana 

0.142% 

0.166% 

0.065% 

0.047% 

0.116% 

0.115% 

Gibraltar 

0.017% 

0.014% 

0.013% 

0.016% 

0.014% 

0.013% 

Great Britain 

2.653% 

2.673% 

2.867% 

2.912% 

2.935% 

2.867% 

Greece 

0.623% 

0.556% 

0.517% 

0.504% 

0.488% 

0.495% 

Greenland 

0.035% 

0.035% 

0.026% 

0.060% 

0.027% 

0.043% 

Grenada 

0.028% 

0.030% 

0.031% 

0.017% 

0.030% 

0.025% 

Guadeloupe (Erench) 

0.132% 

0.143% 

0.089% 

0.092% 

0.089% 

0.088% 

Guam (USA) 

0.144% 

0.133% 

0.104% 

0.099% 

0.100% 

0.093% 

Guatemala 

0.155% 

0.152% 

0.168% 

0.095% 

0.172% 

0.182% 

Guinea 

0.045% 

0.040% 

0.023% 

0.024% 

0.032% 

0.020% 

Guinea Bissau 

0.003% 

0.002% 

#N/A 

#N/A 

0.001% 

0.001% 

Guyana 

0.017% 

0.019% 

0.048% 

0.022% 

0.023% 

0.023% 

Haiti 

0.106% 

0.079% 

0.041% 

0.048% 

0.043% 

0.044% 

Honduras 

0.176% 

0.170% 

0.095% 

0.103% 

0.102% 

0.100% 

Hong Kong 

0.854% 

0.872% 

1.071% 

1.056% 

1.040% 

1.043% 
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Hungary 

0.706% 

0.680% 

0.643% 

0.679% 

0.652% 

0.656% 

Iceland 

0.294% 

0.260% 

0.194% 

0.195% 

0.189% 

0.189% 

India 

1.147% 

1.096% 

1.101% 

1.011% 

1.031% 

1.022% 

Indonesia 

0.515% 

0.489% 

0.508% 

0.493% 

0.532% 

0.541% 

Iran 

0.679% 

0.684% 

0.498% 

0.500% 

0.480% 

0.503% 

Iraq 

0.226% 

0.243% 

0.123% 

0.104% 

0.131% 

0.129% 

Ireland 

0.606% 

0.609% 

0.594% 

0.559% 

0.583% 

0.563% 

Israel 

0.631% 

0.608% 

0.613% 

0.580% 

0.582% 

0.591% 
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